FRECLE Mining: Discovering Frequent Semantic Tree Cluster Sequences from Historical Tree Structured Data
Authors
Abstract
Mining frequent trees is useful in domains such as bioinformatics, web mining, and the mining of semistructured data. Existing techniques focus on finding “structural” patterns and ignore the “semantics” that may be associated with the subtrees. In this paper we propose an algorithm to mine a novel pattern called frequent semantic tree cluster sequences (FRECLE), which captures the frequent sequential associations between different semantics of tree-structured data. Given a semantic tree sequence database, the algorithm first assigns each semantic tree to a semantic cluster. Next, FRECLE patterns are discovered from the resulting semantic cluster sequences by adopting an existing frequent sequential pattern mining algorithm. FRECLE patterns are beneficial in applications where knowledge of semantic association is significant, such as XML query caching, prefetching of XML data, and clustering of web users. Specifically, we show how our proposed FRECLE mining framework can be used to design an optimal XML query cache replacement strategy. Finally, by reporting the performance of our algorithm and caching strategy in extensive experiments with both synthetic and real datasets, we show the effectiveness and usefulness of FRECLE mining.
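The two stages described in the abstract (map each tree to a cluster, then mine frequent sequences of cluster labels) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' method: the abstract names neither the clustering criterion nor the sequential pattern miner, so the sketch groups trees by their set of node labels (a hypothetical stand-in for semantic clustering) and uses a naive level-wise subsequence miner. The names `node_labels`, `to_cluster_sequences`, and `frequent_sequences` are hypothetical.

```python
def node_labels(tree):
    """Collect all node labels of a tree given as (label, [children])."""
    label, children = tree
    labels = {label}
    for child in children:
        labels |= node_labels(child)
    return labels


def to_cluster_sequences(database):
    """Stage 1 (assumed criterion): trees with the same label set share a cluster id."""
    cluster_ids = {}
    sequences = []
    for tree_sequence in database:
        cluster_sequence = []
        for tree in tree_sequence:
            key = frozenset(node_labels(tree))
            # Assign the next free integer id the first time a label set is seen.
            cluster_sequence.append(cluster_ids.setdefault(key, len(cluster_ids)))
        sequences.append(cluster_sequence)
    return sequences


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def frequent_sequences(sequences, min_support, max_len=3):
    """Stage 2 (naive stand-in miner): level-wise search for frequent subsequences."""
    items = sorted({c for seq in sequences for c in seq})
    frequent = {}
    candidates = [(item,) for item in items]
    for _ in range(max_len):
        survivors = []
        for candidate in candidates:
            support = sum(is_subsequence(candidate, seq) for seq in sequences)
            if support >= min_support:
                frequent[candidate] = support
                survivors.append(candidate)
        # Extend only frequent patterns; an infrequent prefix cannot become frequent.
        candidates = [p + (item,) for p in survivors for item in items]
        if not candidates:
            break
    return frequent


# Toy database: each inner list is one sequence of trees.
book = ("book", [("title", []), ("author", [])])
book_reordered = ("book", [("author", []), ("title", [])])
review = ("review", [("rating", [])])
price = ("price", [("amount", [])])

database = [
    [book, review, price],
    [book_reordered, review],
    [book, price, review],
]

cluster_sequences = to_cluster_sequences(database)
patterns = frequent_sequences(cluster_sequences, min_support=2)
```

In this toy run, `book` and `book_reordered` fall into the same cluster because they carry the same labels, so the pattern "book cluster followed by review cluster" is supported by all three sequences. A real system would replace both stages with the semantic clustering and the sequential pattern miner chosen by the authors.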
Similar resources
Mining of Users’ Access Behaviour for Frequent Sequential Pattern from Web Logs
Sequential Pattern mining is the process of applying data mining techniques to a sequential database for the purposes of discovering the correlation relationships that exist among an ordered list of events. The task of discovering frequent sequences is challenging, because the algorithm needs to process a combinatorially explosive number of possible sequences. Discovering hidden information fro...
Efficient Substructure Discovery from Large Semi-structured Data
With the rapid progress of network and storage technologies, a huge amount of electronic data such as Web pages and XML data [23] has become available on intranets and the Internet. These electronic data are a heterogeneous collection of ill-structured data that have no rigid structure, and are often called semi-structured data [1]. Hence, there have been increasing demands for automatic methods for extracting usef...
Discovering Minimal Infrequent Structures from XML Documents
More and more data (documents) are wrapped in XML format. Mining these documents involves mining the corresponding XML structures. However, the semi-structured (tree structured) XML makes it somewhat difficult for traditional data mining algorithms to work properly. Recently, several new algorithms were proposed to mine XML documents. These algorithms mainly focus on mining frequent tree struct...
Development of a Combined System Based on Data Mining and Semantic Web for the Diagnosis of Autism
Introduction: Autism is a nervous system disorder, and since there is no direct diagnosis for it, data mining can help diagnose the disease. Ontology as a backbone of the semantic web, a knowledge database with shareability and reusability, can be a confirmation of the correctness of disease diagnosis systems. This study aimed to provide a system for diagnosing autistic children with a combinat...
Publication date: 2007